Workshop 2: An introduction to coding in R

Tom Keaney

Recapping yesterday

The Holy trinity:

  • select()
  • filter()
  • mutate()

Build your phenotypic classes

  • Not every individual has a recorded wing length.

  • But other morphological traits in the dataset can stand in for it.

  • Create classification criteria and implement them.

Phenotypic classes

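pterosaur_data_classes <-
  pterosaur_data %>% 
  # Total length of one wing: sum of the wing bones (NA if any bone is missing)
  mutate(single_wing_length = 
           HUMERUS + RADIUS + METACARPAL_4 + WING_PHALANX_1 + 
           WING_PHALANX_2 + WING_PHALANX_3 + WING_PHALANX_4) %>% 
  # case_when() checks the rules top to bottom and the first TRUE rule wins,
  # so wing length is used whenever it exists and other traits act as fallbacks
  mutate(Size_class = case_when(single_wing_length < 300 ~ "Small", 
                               single_wing_length >= 300 ~ "Large", 
                               WING_PHALANX_1 < 80 ~ "Small",
                               WING_PHALANX_1 >= 80 ~ "Large",
                               FEMUR < 20 ~ "Small",
                               FEMUR >= 20 ~ "Large",
                               # tails of 150-199 mm match neither rule and fall through
                               TAIL < 150 ~ "Small",
                               TAIL >= 200 ~ "Large",
                               # skulls of 80-99 mm fall through to the HUMERUS rules...
                               SKULL < 80 ~ "Small",
                               SKULL >= 100 ~ "Large",
                               HUMERUS < 25 ~ "Small",
                               HUMERUS >= 30 ~ "Large",
                               # ...and, failing those, are split at 90 mm here
                               SKULL < 90 ~ "Small",
                               SKULL >= 90 ~ "Large",
                               # no usable measurement (.default needs dplyr >= 1.1.0)
                               .default = "Unknown")) %>% 
  # Move the new columns to the front
  select(Individual_ID, Size_class, single_wing_length, everything())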
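Because case_when() assigns the value of the first condition that is TRUE, the order of the rules sets their priority. A comparison involving NA is treated as not matched, so individuals with a missing measurement fall through to the next rule. The sketch below is a minimal illustration on a hypothetical four-row tibble (toy values, not the workshop data), assuming dplyr 1.1.0 or later for `.default`.

```r
library(dplyr)

# Hypothetical measurements in mm; not taken from pterosaur_data
toy <- tibble(
  Individual_ID      = 1:4,
  single_wing_length = c(250, 420, NA, NA),
  FEMUR              = c(25, 18, 15, NA)
)

toy_classes <- toy %>%
  mutate(Size_class = case_when(
    single_wing_length < 300  ~ "Small",  # checked first
    single_wing_length >= 300 ~ "Large",
    FEMUR < 20                ~ "Small",  # only reached if wing length is NA
    FEMUR >= 20               ~ "Large",
    .default = "Unknown"                  # nothing matched
  ))

toy_classes$Size_class
```

Individual 1 ends up "Small" even though its femur alone would make it "Large", because the wing-length rule comes first; individual 4 has no usable measurement and gets "Unknown".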

Bonus content: summarise()

  • summarise() works like mutate(), but collapses many rows into a single summary row
  • group_by() splits the rows into groups, so summarise() returns one row per group
pterosaur_data_classes %>% 
  group_by(Size_class) %>% 
  # na.rm = TRUE: mean() has a built-in way to drop NA values
  summarise("Wing length" = mean(single_wing_length, na.rm = TRUE))
# A tibble: 3 × 2
  Size_class `Wing length`
  <chr>              <dbl>
1 Large               490.
2 Small               197.
3 Unknown             NaN 
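The Unknown class shows NaN rather than NA because every wing length in that group is missing: once na.rm = TRUE drops them, mean() is averaging zero values, which gives 0 / 0. A minimal sketch with hypothetical values:

```r
library(dplyr)

# Hypothetical wing lengths (mm); every "Unknown" value is missing
toy <- tibble(
  Size_class         = c("Large", "Large", "Small", "Unknown"),
  single_wing_length = c(480, 500, 200, NA)
)

wing_summary <- toy %>%
  group_by(Size_class) %>%
  summarise(`Wing length` = mean(single_wing_length, na.rm = TRUE))

# Removing the NAs leaves nothing to average, so the result is NaN
mean(c(NA_real_, NA_real_), na.rm = TRUE)
```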

Expanding our vocabulary

  • pivot_longer()

  • slice_sample()

  • n()
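A quick sketch of the three new verbs on a hypothetical four-individual tibble (toy values in mm, not the workshop data): pivot_longer() stacks trait columns into name/value pairs, slice_sample() draws random rows, and n() counts the rows in each group.

```r
library(dplyr)
library(tidyr)

# Hypothetical four-individual dataset (lengths in mm)
toy <- tibble(
  Individual_ID = 1:4,
  Size_class    = c("Small", "Small", "Large", "Large"),
  ORBIT         = c(10, 12, 20, 22),
  FEMUR         = c(15, 18, 30, 35)
)

# pivot_longer(): 4 individuals x 2 traits -> 8 rows of Trait/Length
toy_long <- toy %>%
  pivot_longer(cols = ORBIT:FEMUR, names_to = "Trait", values_to = "Length")

# slice_sample(): draw 2 random rows; set.seed() makes the draw repeatable
set.seed(1)
toy_sample <- toy %>% slice_sample(n = 2)

# n(): number of rows per group (only works inside dplyr verbs such as summarise())
toy_counts <- toy %>%
  group_by(Size_class) %>%
  summarise(n_individuals = n())
```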

Your task

  1. Split pterosaurs into phenotypic classes and remove those you can’t categorise
  2. Trim the dataframe to only include the relevant measures
  3. Summarise the data to show the mean for your chosen morphological traits, for each class
  4. Convert to cm and round to zero decimal places

Focus on writing clear code, with comments (using the #) accompanying each important step.

Hint: the round() function can be used inside mutate()
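As an illustration of the hint, here is a minimal sketch on a hypothetical table of per-class means in mm (made-up numbers, not real results). across() applies the same conversion to several columns at once; writing one round() per column inside mutate() works just as well.

```r
library(dplyr)

# Hypothetical per-class means in mm (not real results)
toy_means <- tibble(
  Size_class = c("Large", "Small"),
  SKULL      = c(123, 87),
  FEMUR      = c(46, 18)
)

toy_means_cm <- toy_means %>%
  # divide by 10 to convert mm to cm, then round to zero decimal places
  mutate(across(c(SKULL, FEMUR), ~ round(.x / 10, digits = 0)))
```

Note that round() rounds an exact .5 to the nearest even number, so round(0.5) is 0 and round(2.5) is 2.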

Table making

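Once complete, pipe (%>%) your polished dataframe into pander() to make a neat table

library(pander) # install.packages("pander") if it is not installed

your_dataframe %>%   # replace your_dataframe with your own summary table
  pander(split.cell = 20, split.table = Inf) # split.table = Inf keeps wide tables in one piece
# See ?pander for the other formatting options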

Joins

What if we have two separate dataframes that we want to merge?

# 8 random individuals; each table below takes its own random 5, so only some IDs overlap
eight_random_pterosaurs <- pterosaur_data_classes %>% 
  drop_na(ORBIT, TAIL) %>%   # keep individuals with both measurements
  slice_sample(n = 8)

eye_stats <- 
  eight_random_pterosaurs %>%
  slice_sample(n = 5) %>% 
  select(Individual_ID, ORBIT) %>% 
  arrange(Individual_ID)

tail_stats <- 
  eight_random_pterosaurs %>%
  slice_sample(n = 5) %>% 
  select(Individual_ID, TAIL) %>% 
  arrange(Individual_ID)
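slice_sample() draws different rows every time the chunk runs, so your eye_stats and tail_stats will not match the tables printed here. Calling set.seed() first makes the random draw repeatable. A minimal sketch on a hypothetical eight-row tibble; the seed value is arbitrary.

```r
library(dplyr)

# Hypothetical table of 8 individuals
toy <- tibble(Individual_ID = 1:8)

set.seed(2024)                            # arbitrary seed
draw_1 <- toy %>% slice_sample(n = 5)

set.seed(2024)                            # same seed again...
draw_2 <- toy %>% slice_sample(n = 5)     # ...gives the same 5 rows
```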

Joins

eye_stats
# A tibble: 5 × 2
  Individual_ID ORBIT
          <dbl> <dbl>
1            11    12
2            24    14
3            44    23
4            67    21
5           118    18
tail_stats
# A tibble: 5 × 2
  Individual_ID  TAIL
          <dbl> <dbl>
1            26   163
2            44   277
3            67   262
4           118   259
5           136   350

For a join to work, the two dataframes need a common column (a key) that links their rows: here, Individual_ID

left_join()

  • Add the columns of dataframe y to dataframe x
  • The common element (key) is Individual_ID
  • Keep all observations in x, filling unmatched rows with NA
eye_stats %>% 
  left_join(tail_stats, by = "Individual_ID")
# A tibble: 5 × 3
  Individual_ID ORBIT  TAIL
          <dbl> <dbl> <dbl>
1            11    12    NA
2            24    14    NA
3            44    23   277
4            67    21   262
5           118    18   259
tail_stats %>% 
  left_join(eye_stats, by = "Individual_ID")
# A tibble: 5 × 3
  Individual_ID  TAIL ORBIT
          <dbl> <dbl> <dbl>
1            26   163    NA
2            44   277    23
3            67   262    21
4           118   259    18
5           136   350    NA

inner_join()

  • Keep only the rows of x that have a matching Individual_ID in y
eye_stats %>% 
  inner_join(tail_stats, by = "Individual_ID")
# A tibble: 3 × 3
  Individual_ID ORBIT  TAIL
          <dbl> <dbl> <dbl>
1            44    23   277
2            67    21   262
3           118    18   259
tail_stats %>% 
  inner_join(eye_stats, by = "Individual_ID")
# A tibble: 3 × 3
  Individual_ID  TAIL ORBIT
          <dbl> <dbl> <dbl>
1            44   277    23
2            67   262    21
3           118   259    18
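To check these row counts without any random sampling, the two printed tables can be typed in with tibble::tribble() and joined again. Naming the key with by = "Individual_ID" also stops dplyr printing its "Joining with `by = ...`" message when it guesses the key from shared column names.

```r
library(dplyr)
library(tibble)

# The eye_stats and tail_stats tables exactly as printed above
eye_stats <- tribble(
  ~Individual_ID, ~ORBIT,
  11,  12,
  24,  14,
  44,  23,
  67,  21,
  118, 18
)
tail_stats <- tribble(
  ~Individual_ID, ~TAIL,
  26,  163,
  44,  277,
  67,  262,
  118, 259,
  136, 350
)

# left_join keeps all 5 rows of x; TAIL is NA where there is no match
eye_left <- eye_stats %>% left_join(tail_stats, by = "Individual_ID")

# inner_join keeps only the 3 IDs present in both tables
eye_inner <- eye_stats %>% inner_join(tail_stats, by = "Individual_ID")
```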

Visualising data

  • Visual summaries are often the most effective way to communicate scientific results

  • The ggplot2 package is loaded as part of the tidyverse

ggplot()

  • Build plots one layer at a time

  • Layers are added on top of one another

  • New layers are added with the + symbol

  • + == %>% in ggplot-land

Getting started

pterosaur_data_classes %>% 
  ggplot(aes())
  • ggplot() on its own provides an empty canvas
  • aes() maps variables in the data to visual aesthetics such as x, y, fill and colour

Building a geom_histogram()

pterosaur_data_classes %>% 
  ggplot(aes(x = SKULL/10, fill = Size_class)) +
  geom_histogram(binwidth = 0.1) 

Fix the labels

pterosaur_data_classes %>% 
  ggplot(aes(x = SKULL/10, fill = Size_class)) +
  geom_histogram(binwidth = 0.1) +
  labs(x = "Skull length (cm)", y = "No. individuals", fill = "Size class")

Fix the theming

pterosaur_data_classes %>% 
  ggplot(aes(x = SKULL/10, fill = Size_class)) +
  geom_histogram(binwidth = 0.1) +
  labs(x = "Skull length (cm)", y = "No. individuals", fill = "Size class") +
  theme_classic() + # new
  theme(panel.grid.major = element_line(), # new
        text = element_text(size= 14)) # new

Fix the axis

pterosaur_data_classes %>% 
  ggplot(aes(x = SKULL/10, fill = Size_class)) +
  geom_histogram(binwidth = 0.1) +
  labs(x = "Skull length (cm)", y = "No. individuals", fill = "Size class") +
  scale_x_continuous(expand = c(0, 0), # new
                     breaks = c(0, 4.0, 8.0, 12.0, 16.0, 20.0), # new
                     limits = c(0, 20.0)) + # new
  scale_y_continuous(expand = c(0, 0)) + # new
  theme_classic() + 
  theme(panel.grid.major = element_line(),
        text = element_text(size= 14))

Change the colours

pterosaur_data_classes %>% 
  ggplot(aes(x = SKULL/10, fill = Size_class)) +
  geom_histogram(binwidth = 0.1) +
  labs(x = "Skull length (cm)", y = "No. individuals", fill = "Size class") +
  scale_x_continuous(expand = c(0, 0),
                     breaks = c(0, 4.0, 8.0, 12.0, 16.0, 20.0),
                     limits = c(0, 20.0)) +
  scale_y_continuous(expand = c(0, 0)) +
  scale_fill_manual(values = c(met.brewer("Monet")[2], met.brewer("Monet")[8])) + # new (needs library(MetBrewer))
  theme_classic() +
  theme(panel.grid.major = element_line(),
        text = element_text(size= 14))

Change to geom_density()

pterosaur_data_classes %>% 
  ggplot(aes(x = SKULL/10, fill = Size_class)) + 
  geom_density(colour = NA, alpha = 0.7) + # new
  labs(x = "Skull length (cm)", y = "Density", fill = "Size class") + # density, not counts
  scale_fill_manual(values = c(met.brewer("Monet")[2], met.brewer("Monet")[8])) +
  scale_x_continuous(expand = c(0, 0),
                     breaks = c(0, 4.0, 8.0, 12.0, 16.0, 20.0),
                     limits = c(0, 20.0)) +
  scale_y_continuous(expand = c(0, 0)) +
  theme_classic() +
  theme(panel.grid.major = element_line(),
        text = element_text(size= 14))
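By default geom_density() draws a probability density, so each curve has an area of 1 and the y-axis is not a count. Mapping y = after_stat(count) scales each curve by its group size (density multiplied by the number of observations), so the area under each curve equals the number of individuals and larger classes draw taller curves. A minimal sketch on simulated, hypothetical skull lengths; layer_data() returns the numbers ggplot2 actually drew.

```r
library(ggplot2)

# Hypothetical skull lengths in cm for two size classes
set.seed(1)
toy <- data.frame(
  skull_cm   = c(rnorm(30, mean = 6, sd = 1), rnorm(30, mean = 12, sd = 1.5)),
  Size_class = rep(c("Small", "Large"), each = 30)
)

# y = after_stat(count): plot density * number of individuals instead of density
p_count <- ggplot(toy, aes(x = skull_cm, y = after_stat(count), fill = Size_class)) +
  geom_density(colour = NA, alpha = 0.7) +
  labs(x = "Skull length (cm)", y = "Count density", fill = "Size class")

# The drawn y values are the computed counts
drawn <- layer_data(p_count)
```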

Joyplots

library(ggridges) # provides geom_density_ridges()

# use pivot_longer to get all traits into a single column
pterosaur_data_classes %>% 
  select(-single_wing_length) %>% 
  pivot_longer(cols = ORBIT:TIBIA, 
               names_to = "Trait",
               values_to = "Length") %>%
  # mean length of each trait, used to order the ridges
  group_by(Trait) %>% 
  mutate(Mean_value = mean(Length, na.rm = TRUE)) %>% 
  ungroup() %>% 
  
  # plot
  ggplot(aes(x = Length, y = fct_reorder(Trait, Mean_value))) +
  geom_density_ridges(alpha = 0.5, scale = 3, linewidth = 0,
                      fill = "#05595B", color = NA) +
  scale_x_continuous(limits = c(0, NA), expand = c(0, 0)) +
  labs(y = "Trait", x = "Length (mm)") +
  theme_minimal() +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 13),
        panel.grid.major.y = element_line(linewidth = 0.5)) # linewidth replaces size (ggplot2 >= 3.4.0)
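The ridges are ordered by fct_reorder(), which sorts the levels of Trait by a summary (the median, by default) of Mean_value within each trait. Since Mean_value is constant within a trait, that is simply the trait's mean, and ggplot2 draws the first level at the bottom of a discrete y-axis. A minimal sketch on hypothetical long-format data:

```r
library(dplyr)
library(forcats)

# Hypothetical long-format data (mm)
toy_long <- tibble(
  Trait  = c("TAIL", "TAIL", "ORBIT", "ORBIT", "FEMUR", "FEMUR"),
  Length = c(250, 270, 12, 14, 20, 24)
)

toy_ordered <- toy_long %>%
  group_by(Trait) %>%
  mutate(Mean_value = mean(Length, na.rm = TRUE)) %>%  # same step as the joyplot
  ungroup() %>%
  mutate(Trait = fct_reorder(Trait, Mean_value))       # ascending by mean

levels(toy_ordered$Trait)
```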


The scatterplot: geom_point()

pterosaur_data_classes %>%
  ggplot(aes(x = ORBIT, y = WING_PHALANX_1)) +
  geom_point()

Make improvements

pterosaur_data_classes %>%
  ggplot(aes(x = ORBIT/10, y = WING_PHALANX_1/10)) +
  geom_point() +
  labs(x = "Orbit length (cm)", y = "First wing phalanx\nlength (cm)") +
  scale_x_continuous(expand = c(0, 0), limits = c(0, 5)) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 22)) +
  theme_classic() +
  theme(panel.grid.major = element_line(),
        text = element_text(size= 14))

geom_point()

pterosaur_data_classes %>% filter(Size_class != "Unknown") %>% 
  ggplot(aes(x = ORBIT/10, y = WING_PHALANX_1/10)) +
  geom_point(aes(fill = Size_class), shape = 21, size = 5, alpha = 0.8, 
             colour = "black") +
  labs(x = "Orbit length (cm)", y = "First wing phalanx\nlength (cm)", 
       fill = "Size class") +
  scale_fill_manual(values = c(met.brewer("Hiroshige")[4], 
                               met.brewer("Hiroshige")[6])) +
  scale_x_continuous(expand = c(0, 0), limits = c(0, 5)) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 22)) +
  theme_classic() +
  theme(panel.grid.major = element_line(),
        text = element_text(size= 14))

Continuous vs discrete colours

pterosaur_data_classes %>% filter(Size_class != "Unknown") %>% 
  ggplot(aes(x = ORBIT/10, y = WING_PHALANX_1/10)) +
  geom_point(aes(fill = ORBIT/10), shape = 21, size = 5, alpha = 0.9, 
             colour = "black") +
  labs(x = "Orbit length (cm)", y = "First wing phalanx\nlength (cm)", 
       fill = "Orbit\nlength") +
  scale_fill_gradientn(colors=met.brewer("Hiroshige", direction = -1)) +
  scale_x_continuous(expand = c(0, 0), limits = c(0, 5)) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 22)) +
  theme_classic() +
  theme(panel.grid.major = element_line(),
        text = element_text(size= 14))

General tips

  • alpha: changes the transparency

  • fill: colours the inside of elements

  • colour: colours the outlines of elements

To map one of these to a variable in your data, place it inside aes(); to give every element the same fixed value, place it outside aes()
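A minimal sketch of the difference, using a hypothetical four-point data frame: inside aes(), fill is mapped to a variable and each group gets its own colour (plus a legend); outside aes(), every point gets the same fixed fill.

```r
library(ggplot2)

# Hypothetical points in two groups
toy <- data.frame(x = 1:4, y = c(2, 4, 3, 5), group = c("A", "A", "B", "B"))

# Inside aes(): fill follows the group variable
p_mapped <- ggplot(toy, aes(x, y)) +
  geom_point(aes(fill = group), shape = 21, size = 4)

# Outside aes(): one fixed fill for every point
p_fixed <- ggplot(toy, aes(x, y)) +
  geom_point(fill = "grey40", shape = 21, size = 4)

# Fill colours ggplot2 actually used for each plot
mapped_fills <- unique(layer_data(p_mapped)$fill)
fixed_fills  <- unique(layer_data(p_fixed)$fill)
```

A common slip is writing aes(fill = "grey40"): inside aes() the string is treated as data, so the points get a default palette colour and a legend labelled "grey40".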

Many more plot styles can be found here

Building a hypothesis

With all of the skills you now possess, outline your hypothesis for why there are discrete size classes in the Rhamphorhynchus dataset. Write the report up in Quarto, using visual summaries of the data to support your hypothesis.

Make sure to:

  • Tidy your code

  • Use comments within code chunks

  • Write explanations outside of code chunks

  • See the _quarto.yml file

  • Quarto HTML editing

The power of Quarto

Tom’s supplementary material